The dynamics of a social population is analyzed, taking into account physical constraints on
individual behavior and decision-making abilities. The model, based on Evolutionary Game Theory,
predicts that a population has to pass through a series of different games, e.g. as a consequence
of environmental fluctuations, in order to develop social cooperation and communication skills.
This differs from the general assumption that the evolution of cooperation, the so-called
Cooperation Paradox, can be explained by a single set of rules for intra-population competition.
The developed methods potentially have practical value for learning optimization problems in
multiagent, e.g. corporate, environments.



The behavior of social animals, especially human beings, is characterized by a contradictory
balance between selfishness [1] and altruism, acting alone or joining an alliance, logical
thinking and irrationality [2]. The ambiguity of the observations complicates the separation of
field data into causes and consequences of the evolution of social behavior. Moreover, the
relevant original phenomena can be suppressed in modern populations. Therefore, it is beneficial
to study the emergence of sociability by presenting evolution as a game imposed on the individual
members of a population and comparing the different possible evolutionary mechanisms using
Evolutionary Game Theory [3].

The main question of the theory of social evolution is, probably, how "selfish" competition for
a better individual share of the gene pool of the subsequent generation led to the development
of a willingness to contribute to others at personal expense. Any act committed in favor of
others may contradict the Darwinian survival of the fittest. Indeed, the Cooperation Paradox
emerges when one tries to define the main properties of the game corresponding to the
development of social cooperation: games favoring single selfish winners, like the Prisoner's
Dilemma or Hawk-Dove games, correspond to our notion of reality, yet predict a lower level of
cooperation than is believed to be present in nature.

There are three general possibilities to explain observations of behavior that seems irrational
from the point of view of Darwinian evolution. First, helping others can be justified by some
unforeseen personal interest, e.g. because of group or kin (self-sacrifice in favor of the
family) selection, as well as due to future reward or punishment ensured by reciprocity or
altruistic punishment [13]. Second, one can assume that the optimum of individual behavior has
not evolved yet or that its fluctuations are significant. Third, physical constraints on the
possible evolutionary developments can cause cooperation-like behavior, e.g. enforcement of fair
signalling by a physical inability to deceive. Group selection is considered to be a weak
phenomenon. This makes the physical-constraints approach, probably, the only one capable of
explaining cooperation between unrelated individuals with no common past or future as a general
property of our world.

In this Letter a model of an evolving social population is constructed, based on, probably, the
oldest social dilemma: acting alone or joining an alliance. The need for specific assumptions
about the individual decision-making mechanisms and the structure of the intra-population
interaction network [14] is overcome using symmetry considerations and some physical
constraints, e.g. the inability to divide resources such as small amounts of food or mating
opportunities in mammals, together with limitations on rational group decision making discussed
in social science and psychology [15, 16, 17]. The current state of a population is demonstrated
to depend on the whole history of the evolutionary selection rules, rather than only the most
recent ones, providing an explanation for the observed significant differences in the social
abilities of otherwise similar groups. In addition, a population based on mutual reciprocal
exploitation, characterized by correlated mutual responses of the cooperate/exploit type, is
predicted to be the most likely evolutionary development. Correlated responses of this type are
impossible without the development of information exchange between individual members of the
population. The population dynamics and stability conditions are analyzed for a broad range of
possible evolutionary games, including the Prisoner's Dilemma and the Chicken (Snow-Drift,
Hawk-Dove) game. The results seem to correspond to the available qualitative experimental data.

A quantitative description of the evolution of a population requires parametrization of the
possible individual responses, the inter-population interactions and the individual
decision-making abilities. Fortunately, binary responses are common in biology, for instance
fight or retreat, cooperate or defect, etc. Therefore, we consider a population composed of
individuals capable of generating only selfish (1) and cooperative (2) responses. Individuals
compete with each other for resources, either alone or by forming alliances. The costs and
benefits of the competition are taken into account through the payoffs for choosing a specific
response against the response of the opponent (see Fig. 1A). The payoffs can vary with time,
favoring or suppressing cooperation, e.g. due to changes in the value of the resources caused by
environmental fluctuations.

In the presented model, each individual possesses three parameters to evolve: sociability
$\epsilon$, together with $\alpha$ and $\beta$, which describe individual behavior in the course
of a competition. The inter-population competitions are approximated by pairwise interactions,
with only one of the opponents acting either individually or as a member of an alliance, with
individual probabilities $1 - \epsilon$ and $\epsilon$, respectively (see Fig. 1B). This
corresponds to biological species competing for almost indivisible prizes, like small amounts of
food or mating opportunities in mammals. In this case, each prize can be consumed only by the
individual winner of an individual/individual or individual/alliance contest. The parameter
$\epsilon$ can be considered a measure of the presence of alliances: in a homogeneous population
with $\epsilon = 0$ all the competitions are of the individual/individual type, whereas if
$\epsilon = 1$ only individual/alliance contests occur.

An individual acting alone is characterized by the probabilities $\alpha$ and $\beta$ of giving
the cooperative response 2 against an opponent in the selfish (1) and cooperative (2) modes,
respectively (see Fig. 1C), receiving a payoff according to the table of payoffs (see Fig. 1A).
We assume that the members of a population have equal opportunities to consume a specific
amount of prizes during all possible interactions, rather than per single pairwise interaction.
Therefore the table of payoffs describes the average total possible payoffs of an individual for
its competitions when it is not acting as a member of an alliance, although this requires
competitions with prizes of different values to the competitors. The parameters $\alpha$ and
$\beta$ are similar to correlations with the previous choice of the opponent. However, $\alpha$
and $\beta$ include individual decision-making abilities and any (rather than only memory-based)
detection of the intentions of others. This corresponds to biological species, especially human
beings, recognizing the intentions of others with whom they share no common past [23].

An individual acting as a member of an alliance generates a random response according to the
average statistics of its responses, and gets no payoff for the interaction. The random behavior
is an extreme approximation of the possible constraints on the decision-making abilities of an
individual in a group. It can be interpreted as a type of conformist behavior, taking into
account that a random response, biased by the average behavior, is indistinguishable from the
"average behavior" definition of social norms [25]. The no-payoff condition takes into account
the reduction of the cost of defeat and of the benefit of success for the members of an
alliance competing together for almost indivisible prizes. Surprisingly, these approximations
make it possible to optimize the benefit of the whole population through selfish competition
between its members.

Evolution of a population consists of two processes: density redistribution between the existing
phenotypes $(\alpha_i, \beta_i, \epsilon_i)$ and the emergence of new mutations with initial
minimal "seed" densities. The first process is described by the Replicator Dynamics equations,
which in their standard form read

$$\frac{d\rho_i}{dt} = \rho_i \Big( F_i - \sum_j \rho_j F_j \Big), \qquad (1)$$

where $\rho_i$ is the density (sometimes called frequency) of individual $i$ in the population.
The individual fitness $F_i$ is the payoff $P_i$ of individual $i$ averaged over all possible
interactions. To define the emergence of new mutations, the density of states in
$(\alpha, \beta, \epsilon)$ space is assumed to be homogeneous, and new mutations are assumed to
be small modifications of the existing phenotypes.
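
The density redistribution and the seeding of mutants can be sketched numerically. The following is a minimal sketch only: it assumes the standard replicator form $d\rho_i/dt = \rho_i (F_i - \sum_j \rho_j F_j)$ integrated with an explicit Euler step, and the function names, the time step `dt` and the seed density `seed` are illustrative assumptions rather than values taken from the paper.

```python
def replicator_step(rho, fitness, dt=0.01):
    """One explicit Euler step of d(rho_i)/dt = rho_i * (F_i - <F>).

    rho     -- list of phenotype densities (should sum to 1)
    fitness -- list of fitness values F_i, same length as rho
    dt      -- time step (illustrative assumption)
    """
    mean_fitness = sum(r * f for r, f in zip(rho, fitness))
    new_rho = [r + dt * r * (f - mean_fitness) for r, f in zip(rho, fitness)]
    total = sum(new_rho)  # renormalize to guard against round-off drift
    return [r / total for r in new_rho]


def add_mutant(rho, seed=1e-3):
    """Append a new phenotype with a small 'seed' density, keeping sum(rho) = 1."""
    return [r * (1.0 - seed) for r in rho] + [seed]
```

Starting from `rho = [0.5, 0.5]` with `fitness = [1.0, 0.0]`, one call to `replicator_step` shifts density toward the fitter phenotype while keeping the total density equal to one.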

The individual fitness $F_i$ (see eq. (1)) depends on all the phenotypes
$(\alpha_j, \beta_j, \epsilon_j)$ present in the population. According to the definitions of
$\alpha$ and $\beta$, the average payoff of individual $i$ for an interaction with an opponent
that has probability $\gamma$ of being in the selfish state 1 is (see Fig. 1A and 1C):

$$P_i(\gamma) = \gamma \alpha_i c - \gamma (1 - \alpha_i) + (1 - \beta_i)(1 - \gamma) b, \qquad (2)$$

where $b$ and $c$ are the payoffs (see Fig. 1A). Let us define $\gamma_{ij}$ to be the average
$\gamma$ of individual $(\alpha_i, \beta_i)$ interacting with individual $(\alpha_j, \beta_j)$.
In a population composed of identical individuals, self-consistent symmetry requires
$\gamma_{ii} = (1 - \alpha)\gamma_{ii} + (1 - \beta)(1 - \gamma_{ii})$, see Fig. 1C and 1D.
Consequently:



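$$\gamma_{ii} = \frac{1 - \beta}{1 + \alpha - \beta}. \qquad (3)$$

In the same fashion, $\gamma_{ij} = (1 - \alpha_i)\gamma_{ji} + (1 - \beta_i)(1 - \gamma_{ji})$.
Taking into account the mirror equation for $\gamma_{ji}$ one gets:

$$\gamma_{ij} = \frac{(1 - \beta_i) - (1 - \beta_j)(\alpha_i - \beta_i)}{1 - (\alpha_i - \beta_i)(\alpha_j - \beta_j)}. \qquad (4)$$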
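
Eqs. (2)-(4) are easy to check numerically. The sketch below assumes the payoff reading $P_i(\gamma) = \gamma\alpha_i c - \gamma(1-\alpha_i) + (1-\beta_i)(1-\gamma)b$ and the closed forms $\gamma_{ii} = (1-\beta)/(1+\alpha-\beta)$ and $\gamma_{ij} = [(1-\beta_i) - (1-\beta_j)(\alpha_i-\beta_i)] / [1 - (\alpha_i-\beta_i)(\alpha_j-\beta_j)]$; the function names are illustrative and not part of the original work.

```python
def payoff(alpha, beta, gamma, b, c):
    """Eq. (2): average payoff against an opponent that is selfish with probability gamma."""
    return (gamma * alpha * c
            - gamma * (1.0 - alpha)
            + (1.0 - beta) * (1.0 - gamma) * b)


def gamma_self(alpha, beta):
    """Eq. (3): probability of the selfish state in a homogeneous population."""
    denom = 1.0 + alpha - beta
    if abs(denom) < 1e-12:
        raise ValueError("gamma_ii is undetermined at alpha = 0, beta = 1")
    return (1.0 - beta) / denom


def gamma_pair(alpha_i, beta_i, alpha_j, beta_j):
    """Eq. (4): probability that i is in the selfish state when paired with j."""
    denom = 1.0 - (alpha_i - beta_i) * (alpha_j - beta_j)
    if abs(denom) < 1e-12:
        raise ValueError("gamma_ij is undetermined when (a_i - b_i)(a_j - b_j) = 1")
    return ((1.0 - beta_i) - (1.0 - beta_j) * (alpha_i - beta_i)) / denom
```

For identical phenotypes `gamma_pair` reduces to `gamma_self`, and for the exploitative phenotype $(\alpha, \beta) = (1, 0)$ at $\gamma = 1/2$ the payoff equals $(b + c)/2$, the average of the two roles in a 12 interaction.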

The average $\gamma_i$, corresponding to the probability of finding individual $i$ in state 1
and required to describe its response as a member of an alliance, is derived from the
self-consistent system of equations:

$$\gamma_i = \sum_j \Big[ (1 - \epsilon_j)\rho_j \big( (1 - \alpha_i)\gamma_{ji} + (1 - \beta_i)(1 - \gamma_{ji}) \big) + \epsilon_j \rho_j \big( (1 - \alpha_i)\gamma_j + (1 - \beta_i)(1 - \gamma_j) \big) \Big], \qquad (5)$$

where the averaging is performed over all interactions with all other individuals $j$, in their
individual ($1 - \epsilon_j$) and member-of-an-alliance ($\epsilon_j$) modes. The fitness $F_i$
is obtained by averaging $P_i$ over all possible interactions, similarly to eq. (5); this
average constitutes eq. (6).

Eqs. (1) and (6) define the evolutionary dynamics of the proposed model.
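
The self-consistent system for the alliance-mode probabilities can be solved by simple fixed-point iteration. The sketch below is hedged: it assumes that eq. (5) has the form $\gamma_i = \sum_j \rho_j [(1-\epsilon_j)\gamma_{ij} + \epsilon_j((1-\alpha_i)\gamma_j + (1-\beta_i)(1-\gamma_j))]$, with $\gamma_{ij}$ given by eq. (4). The function names, the starting guess and the tolerance are illustrative assumptions.

```python
def gamma_pair(alpha_i, beta_i, alpha_j, beta_j):
    """Eq. (4): probability that i is selfish when paired with individual-mode j."""
    denom = 1.0 - (alpha_i - beta_i) * (alpha_j - beta_j)
    return ((1.0 - beta_i) - (1.0 - beta_j) * (alpha_i - beta_i)) / denom


def alliance_gammas(phenotypes, rho, tol=1e-12, max_iter=10000):
    """Solve the assumed form of eq. (5) for all gamma_i by fixed-point iteration.

    phenotypes -- list of (alpha, beta, eps) tuples
    rho        -- list of densities, same length, summing to 1
    """
    # Opponents met in their individual mode (weight 1 - eps_j) contribute a term
    # that does not depend on the unknown gammas, so it is computed once.
    fixed = []
    for a_i, b_i, _eps_i in phenotypes:
        total = 0.0
        for (a_j, b_j, e_j), r_j in zip(phenotypes, rho):
            weight = r_j * (1.0 - e_j)
            if weight > 0.0:
                total += weight * gamma_pair(a_i, b_i, a_j, b_j)
        fixed.append(total)

    gammas = [0.5] * len(phenotypes)  # arbitrary starting guess
    for _iteration in range(max_iter):
        updated = []
        for (a_i, b_i, _eps_i), base in zip(phenotypes, fixed):
            total = base
            # Opponents met in alliance mode respond randomly with probability gamma_j.
            for (_a_j, _b_j, e_j), r_j, g_j in zip(phenotypes, rho, gammas):
                total += r_j * e_j * ((1.0 - a_i) * g_j + (1.0 - b_i) * (1.0 - g_j))
            updated.append(total)
        if max(abs(u - g) for u, g in zip(updated, gammas)) < tol:
            return updated
        gammas = updated
    raise RuntimeError("fixed-point iteration did not converge")
```

For a homogeneous population the iteration returns $\gamma_{ii}$ of eq. (3) for any $\epsilon$, which is a convenient consistency check of the assumed form.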

Surprisingly, the development of sociability $\epsilon$ is a much slower process than the
evolutionary dynamics of the individual behavior parameters $\alpha$ and $\beta$. For all
possible game payoffs $b$ and $c$, there is no relative evolutionary advantage (no difference in
fitness, eq. (6)) between two competing equipopulated subgroups that differ only in the value of
sociability: $(\alpha, \beta, \epsilon)$ and $(\alpha, \beta, \epsilon + \Delta\epsilon)$ (see
supplementary materials). Consequently, there is no evolutionary preference for a change of the
sociability of a confined population, unless its $\alpha$ and $\beta$ vary with time (see
Fig. 2A). If $b$ and $c$ are constant, the total change
$\Delta\epsilon_{total} \propto \Delta\epsilon_{mut} \sqrt{\Delta\alpha_{total}^2 + \Delta\beta_{total}^2}$
($\Delta\epsilon_{mut}$ is a single mutation step) remains small, because the possible
$\Delta\alpha_{total}$ and $\Delta\beta_{total}$ are limited to about 1 until convergence to one
of the stable points. Therefore, significant changes in $\epsilon$ can be caused only by
interaction payoffs $b$ and $c$ that alternate with time. One example of the development of
$\epsilon$ with time is shown in Fig. 2B, although the general problem of the optimal $b(t)$ and
$c(t)$ remains an open question.

The evolutionarily stable values of $\alpha$ and $\beta$ can be treated as functions of the game
payoffs $b$ and $c$, together with the sociability $\epsilon$, due to the slow dynamics of the
latter (see Fig. 3). Only developed sociability $\epsilon = 1$ ensures the stability of a
population based on mutual reciprocal exploitation ($(\alpha = 1, \beta = 0)$, corresponding to
pairwise interactions of the selfish vs. cooperative type 12) over the whole range of payoffs
corresponding to the Chicken (Snow-Drift, Hawk-Dove) game (see Fig. 3A). A decrease in the value
of $\epsilon$ reduces the number of games that can keep the population out of homogeneous states
(see Fig. 3). Evolution of a population with correlated mutual individual responses
($\alpha \neq \beta$, see Fig. 1D) is possible only with the development of information exchange
between the individuals. Consequently, it can be associated with the development of
communication skills, like language or writing.

Evolution of biological species based entirely on the Prisoner's Dilemma, according to the
model, results in a selfish population ($(\alpha = 0, 0 < \beta < 1)$, pairwise interactions
11), see Figs. 3A and 3D. The individuals of this population, however, can possess an ability to
recognize cooperation (state 2) and cooperate in response ($\beta > 0$, see Figs. 3D, 3A and
1D). A finite level of (22) mutual responses, $\beta(1 - \gamma) \neq 0$, must be observed in a
population with $\gamma < 1$ and $\beta > 0$ playing the Prisoner's Dilemma for a period of time
that is too short to change the individual parameters. This corresponds qualitatively to the
experiments in which different populations demonstrate different levels of cooperation [28] and
degradation of cooperation with time is observed [18].

The model predicts the optimal games for the fastest transition from a selfish or cooperative
population to an exploitative one (see Fig. 4), in the case of developed sociability
$\epsilon = 1$. The time required to develop a specific population depends on the properties of
the population itself and on the history of the game parameters $b(t)$ and $c(t)$. This is a
consequence of the topological constraints on the dynamics in $(\alpha, \beta, \epsilon)$ space,
which define the mutations that can take over the population. For instance, a much longer time
is required to develop a synchronous population ($(\alpha = 0, \beta = 1)$, pairwise
interactions 11, 22) than an exploitative one ($(\alpha = 1, \beta = 0)$, pairwise interactions
12), due to the small number of allowed mutations near the axes $\alpha = 0$ and $\beta = 1$ in
this case (see Fig. 3). Synchronization of responses, as in the synchronous population, is
common between the cells of multicellular organisms. Therefore this prediction is intriguing,
given the disproportionately long time taken for the development of multicellularity relative
to the other major evolutionary transitions [23]. The existence of an optimal game corresponds
to our intuition that neither too harsh nor too soft conditions are optimal for a learning
process.

To conclude, the Cooperation Paradox was addressed by taking into account the advantages and
disadvantages of being a member of an alliance, especially the possible constraints on
individual decision-making abilities. It was demonstrated that a population has to pass through
a series of different conditions, favoring and suppressing selfishness, in order to develop a
robust sociability based on contribution to others at personal expense. In the case of developed
sociability, an exploitative population, characterized by correlated mutual responses of the
cooperate/exploit type (rather than homogeneous cooperation or selfishness), was shown to be
stable for the whole range of the Chicken (Hawk-Dove, Snow-Drift) game. The presented method,
being free from specific assumptions about the individual decision-making mechanisms, provides a
general framework for the analysis of different hypotheses about the evolution of social
behavior. Future developments of the model can include modification of the inter-population
interaction rules and investigation of the possibility that individuals affect the abilities of
others, e.g. evolution of the ability to deceive.
FIG. 1: The evolution of a population is defined by the interactions on the individual level.
(A) Table of payoffs $W_{ij}$ for the pairwise interactions. The table is reduced to a
two-parameter form by subtraction of $W_{22}$ and subsequent normalization by
$|W_{11} - W_{22}|$. Two, rather than four, parameters significantly simplify the presentation.
This normalization does not affect the stable points of the population dynamics (see eqs. (1)
and (6)). (B) The asymmetric probability $\epsilon$ of generating a random response during an
interaction, interpreted as acting alone or as a member of an alliance. (C) The correlations
$\alpha$ and $\beta$ describe the ability of the individual (dashed line) to recognize the
intentions of the opponent (solid line) and make an appropriate decision. The parameter $\gamma$
corresponds to the probability of finding an individual in the selfish response 1. (D) The
$\gamma$ of a homogeneous population composed of individuals $(\alpha, \beta, \epsilon)$. The
points (0,1) and (1,0) describe the correlated populations with the sets of mutual responses
(11, 22) and (12), respectively. The condition $\alpha = \beta$ corresponds to populations
composed of individuals randomly choosing selfish or cooperative responses, regardless of the
state of the opponent.
FIG. 2: Dynamics of a confined population in $(\alpha, \beta, \epsilon)$ space is governed by
the time-dependent game payoffs $b(t)$ and $c(t)$. (A) A population is driven by the game
payoffs, which make specific mutations do better than the others. The population remains
confined as long as new mutations appear less frequently than the old ones take over the
population according to eq. (1). Surprisingly, a confined population in
$(\alpha, \beta, \epsilon)$ space experiences first-order drag, defined by the game weights $b$
and $c$, only in the $(\alpha, \beta)$ plane. The development of the social behavior $\epsilon$
is a second-order process, requiring $\Delta\alpha, \Delta\beta \neq 0$. (B) The development of
$\epsilon$ requires continuous motion away from the points where both $\alpha$ and $\beta$ are
constant. Here is an example of $\epsilon(t)$ with $\Delta\epsilon_{mut} = 0.05$, one mutation
step per 10 time units, $b = 1 + 2\cos(2\pi 10^{-3} N_{steps})$,
$c = 2\sin(2\pi 10^{-3} N_{steps})$, and $\alpha, \beta$ confined to the circle with center at
(0.11, 0.15) and radius 0.05. The derivative $(d\alpha/dt, d\beta/dt)$ was obtained numerically
and used to estimate $d\epsilon/dt$. The change of the game payoffs can be a consequence of
either individual evolution or environmental changes. The evolution of life on Earth,
oscillating between harsh and softer conditions, justifies the requirement to consider
time-dependent $b(t)$ and $c(t)$.
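
The oscillating payoffs of panel (B) can be written out directly. This is a minimal sketch, assuming the caption's formulas read $b = 1 + 2\cos(2\pi 10^{-3} N_{steps})$ and $c = 2\sin(2\pi 10^{-3} N_{steps})$; the function name `payoffs_at` and the `period` parameter are illustrative assumptions.

```python
import math


def payoffs_at(n_steps, period=1000.0):
    """Return (b, c) after n_steps steps of the periodic payoff schedule.

    One full cycle takes `period` steps, i.e. the phase is 2*pi*1e-3*n_steps
    for the default period of 1000.
    """
    phase = 2.0 * math.pi * n_steps / period
    b = 1.0 + 2.0 * math.cos(phase)
    c = 2.0 * math.sin(phase)
    return b, c
```

Over one cycle the pair $(b, c)$ traces a circle of radius 2 centered at $(1, 0)$, so the population is exposed to payoff combinations that alternately favor and suppress cooperation.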




FIG. 3: The stable points in $(\alpha, \beta, \epsilon)$ space. (A) In the case $\epsilon = 1$
for all individuals in the population, all of them experience the rest of society as an
individual in the alliance mode, described by the probability $\gamma_{ii}$ (see eq. (3)) of
expressing response 1. Mutant $j$ can invade the population of $i$ only if
$P_j(\gamma_{ii}) > P_i(\gamma_{ii})$. Using eq. (2) one can write
$\Delta\beta (1 - \gamma_{ii}) b < \Delta\alpha\, \gamma_{ii} (c + 1)$, where
$\Delta\alpha = \alpha_j - \alpha_i$ and $\Delta\beta = \beta_j - \beta_i$. Near the axes
$\alpha = 0$ and $\beta = 1$, for specific $b$ and $c$, the number of evolutionarily favorable
mutations converges to 0, see $\gamma_{ii}$ in Fig. 1D, making the dynamics very slow. Numerical
simulations demonstrate the same behavior over the whole range $0 < \epsilon < 1$. This makes
the exploitative population $(\alpha = 1, \beta = 0)$ the only one that can be reached in a
reproducible way. The exploitative population is more beneficial on average to its members than
the cooperative one ($(0 < \alpha < 1, \beta = 1)$, 22) if $(b + c)/2 > 0$. Otherwise, if
$(b + c)/2 < 0$, the exploitative population is still evolutionarily stable, although the
cooperative society is more beneficial. (B) In the case $\epsilon = 0$, the system either
converges to specific points on the axes $\beta = 0$ and $\alpha = 1$ or, otherwise, remains at
a random point on one of the edges $\alpha = 0$ or $\beta = 1$. The presented data were derived
using the condition for the stable points on the boundaries and direct simulation of populations
composed of up to 10 different phenotypes $(\alpha, \beta, \epsilon)$. (C) The same for the case
$\epsilon = 0.9$. (D) Comparison with the model games demonstrates that a population playing the
Chicken game (sometimes called Hawk-Dove or Snow-Drift,
$W_{12} > W_{22} > W_{11} > W_{21}$), rather than the Prisoner's Dilemma
($W_{12} > W_{22} > W_{21} > W_{11}$), is able to stabilize the exploitative population (1,0).
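
The invasion condition quoted in panel (A) follows from eq. (2) in one step. The derivation below is a sketch that assumes the payoff reading $P_i(\gamma) = \gamma\alpha_i c - \gamma(1-\alpha_i) + (1-\beta_i)(1-\gamma)b$ and writes $\gamma$ for the alliance-mode probability $\gamma_{ii}$ of the resident population.

```latex
\begin{align*}
P_j(\gamma) - P_i(\gamma)
  &= \gamma c\,(\alpha_j - \alpha_i) + \gamma\,(\alpha_j - \alpha_i)
     - (1-\gamma)\, b\,(\beta_j - \beta_i) \\
  &= \Delta\alpha\,\gamma\,(c+1) - \Delta\beta\,(1-\gamma)\, b , \\[4pt]
P_j(\gamma) > P_i(\gamma)
  &\iff \Delta\beta\,(1-\gamma)\, b < \Delta\alpha\,\gamma\,(c+1).
\end{align*}
```

The payoff difference is linear in $\Delta\alpha$ and $\Delta\beta$, so for a given resident $\gamma$ the favorable mutations occupy a half-plane in $(\Delta\alpha, \Delta\beta)$ whose orientation depends on $b$, $c + 1$ and $\gamma$.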
FIG. 4: Development of mutual correlation in the population with $\epsilon = 1$ as a function
of the game payoffs $b$ and $c$. The time required for the transition from a population with
random mutual responses to a correlated one was analyzed by numerical simulation. Such a
transition can occur during the development of a common language. Fast spread of mutations over
the entire population was assumed, corresponding to cultural evolution. In the case
$\epsilon = 1$, following the condition
$\Delta\beta (1 - \gamma_{ii}) b < \Delta\alpha\, \gamma_{ii} (c + 1)$ (see Fig. 3A),
$(c + 1)/b$ is the only relevant parameter, and the two transitions, $(0,0) \to (1,0)$ and
$(1,1) \to (1,0)$, are identical under the transformation $(c + 1)/b \to b/(c + 1)$. Group
selection can be introduced by requiring a growing average fitness of the population,
predicting the optimal games at ($c = 1.5 \pm 0.01$, $b = 0.9 \pm 0.03$) and $b \to \infty$ for
the transitions $(0,0) \to (1,0)$ and $(1,1) \to (1,0)$, respectively. The presented results
were derived using a single-step diffusion simulation. The minimum-search functions of GSL were
used to find the optimal values of $b$, $c$ and $(c + 1)/b$.



